BUG: Fix __ne__ comparison for Categorical #32304

dsaxton · 2020-02-27T18:25:01Z

closes Categorical NaN behaviour different from a str #32276
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

dsaxton · 2020-02-27T18:31:01Z

pandas/core/arrays/categorical.py

@@ -103,7 +103,10 @@ def func(self, other):
            mask = (self._codes == -1) | (other_codes == -1)
            if mask.any():
                # In other series, the leads to False, so do that here too
-                ret[mask] = False
+                if opname == "__ne__":
+                    ret[mask & (self._codes == other_codes)] = True


This redundancy was intentional: it is an attempt to "reuse" the mask calculated above, since I think it may short-circuit when mask is False. I could just as well make it (self._codes == -1) & (other_codes == -1). I also think that, of the comparison operators, __ne__ is the only one that has to be special-cased here. It's also worth pointing out that this assumes we're dealing with NaN rather than NA.
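To make the redundancy concrete: mask marks positions where at least one side has the missing code -1, so restricting it to positions where the codes are equal leaves exactly the positions where both sides are missing. The sketch below checks this with NumPy arrays standing in for self._codes and other_codes; the array values are made up for illustration.

```python
import numpy as np

# Made-up codes; -1 marks a missing value, as in the diff above
self_codes = np.array([0, -1, -1, 1, 0])
other_codes = np.array([0, 0, -1, 1, -1])

# The mask from the diff: at least one side is missing
mask = (self_codes == -1) | (other_codes == -1)

# Reusing the mask and requiring equal codes...
both_missing_via_mask = mask & (self_codes == other_codes)
# ...selects exactly the positions where both sides are missing
both_missing = (self_codes == -1) & (other_codes == -1)
assert (both_missing_via_mask == both_missing).all()

# Patched __ne__: the raw comparison already gives True where only one side
# is missing (-1 never equals a valid code >= 0); only NaN vs NaN needs fixing.
ret = self_codes != other_codes
ret[both_missing_via_mask] = True
print(ret.tolist())  # [False, True, True, False, True]
```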

jbrockmendel · 2020-02-27

(self._codes == -1) & (other_codes == -1)

I think this would be clearer. There might also be a cached _isnan attribute for this.

jbrockmendel · 2020-09-09

@dsaxton I'm working on cleaning up these comparisons and am confused by this line. Can we make this use the more common pattern:

    fill_value = True if op is operator.ne else False
    [...]
    ret = op(self._codes, other_codes)
    mask = (self._codes == -1) | (other_codes == -1)
    ret[mask] = fill_value
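A minimal sketch of the suggested pattern, assuming plain NumPy code arrays with -1 as the missing code. compare_codes is a hypothetical helper written for this illustration, not a pandas function; in the diff above the logic lives inside func in pandas/core/arrays/categorical.py.

```python
import operator

import numpy as np


def compare_codes(left_codes, right_codes, op):
    """Hypothetical helper: compare categorical codes, treating -1 as missing.

    Compute the raw comparison, then overwrite every position where either
    side is missing with one fill value: True for !=, False otherwise.
    """
    fill_value = True if op is operator.ne else False
    ret = op(left_codes, right_codes)  # new boolean array, safe to modify
    mask = (left_codes == -1) | (right_codes == -1)
    ret[mask] = fill_value
    return ret


left = np.array([0, -1, -1, 1])
right = np.array([0, 0, -1, 2])
for op in (operator.eq, operator.ne, operator.lt):
    print(op.__name__, compare_codes(left, right, op).tolist())
# eq [True, False, False, False]
# ne [False, True, True, True]
# lt [False, False, False, True]
```

A single fill value keeps every operator on the same code path: only != needs True at missing positions, while ==, <, >, <= and >= all return False. Ordered comparisons on raw codes are only meaningful for ordered categoricals, whose codes follow category order.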

dsaxton · 2020-09-09

@jbrockmendel That is much simpler and seems correct to me.

jbrockmendel · 2020-02-27T21:09:27Z

pandas/tests/extension/test_categorical.py

@@ -282,6 +282,16 @@ def _compare_other(self, s, data, op_name, other):
            with pytest.raises(TypeError, match=msg):
                op(data, other)

+    def test_not_equal_with_na(self):


Do we get this right in cases with, e.g., datetime64 or datetime64tz categories?

dsaxton · 2020-02-27

It seems to work; I added some parametrization over other category types.
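A hedged sketch of the kind of check discussed here, using the public Categorical.from_codes constructor so the missing code -1 can be placed directly. It is not the test added in this PR; the category cases and the helper name check_not_equal_with_na are assumptions, and the expected result reflects the fixed behaviour (the PR is milestoned for 1.1).

```python
import pandas as pd

# Assumed category cases, covering the datetime64 / datetime64tz question
CATEGORY_CASES = [
    ["a", "b"],
    [0, 1],
    [pd.Timestamp("2019-01-01"), pd.Timestamp("2020-01-01")],
    [pd.Timestamp("2019-01-01", tz="UTC"), pd.Timestamp("2020-01-01", tz="UTC")],
]


def check_not_equal_with_na(categories):
    # Hypothetical helper. Code -1 is missing; both arrays share the same
    # categories, so comparing them is allowed.
    left = pd.Categorical.from_codes([-1, 0, -1, 0], categories=categories)
    right = pd.Categorical.from_codes([0, -1, -1, 0], categories=categories)
    result = (left != right).tolist()
    # Any pair involving a missing value is "not equal", including NaN vs NaN;
    # only the last pair holds two equal, non-missing values.
    assert result == [True, True, True, False], (categories, result)
    return result


for categories in CATEGORY_CASES:
    check_not_equal_with_na(categories)
```

Because the comparison works on codes, the category dtype should not matter; parametrizing over it guards against a dtype-specific code path.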

jreback · 2020-03-03T03:28:46Z

thanks @dsaxton

dsaxton added 4 commits February 27, 2020 10:14

Special case __ne__

cd44bf4

Test

3c621fb

Doc

b3adfe2

Reference self

70f7ebb

dsaxton commented Feb 27, 2020

Blacken

0087a53

jbrockmendel reviewed Feb 27, 2020

dsaxton added 3 commits February 27, 2020 16:37

Don't use mask

94d6238

Param over categories

888f7cf

Merge remote-tracking branch 'upstream/master' into noteq-cat

be39621

jreback added the Categorical (Categorical Data Type) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Mar 3, 2020

jreback added this to the 1.1 milestone Mar 3, 2020

jreback merged commit 821aa25 into pandas-dev:master Mar 3, 2020

dsaxton deleted the noteq-cat branch April 9, 2020 02:37
